T-code compression for Arabic computational morphology
Authors
Abstract
Root-based searching, concordancing, and grammar checking in Arabic are impossible without a method to match words with roots and vice versa. A comprehensive word list is essential for incremental searching, predictive SMS messaging, and spell checking, but because Arabic is both derivational and inflectional, such a list is taxing on storage space and access speed. This paper describes a method for compactly storing and efficiently accessing an extensive dictionary of Arabic words by their morphological properties and roots. Compression of the dictionary is based on T-Code encoding, which follows the Huffman encoding model. The special characteristics inherent in the recursive augmentation method with which codes are created allow compact storage on disk and in memory. They also facilitate the efficient use of bandwidth for Arabic text transmission over intranets and the Internet.
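The recursive augmentation mentioned in the abstract can be illustrated with the general T-augmentation step from the T-code literature. The sketch below is not the paper's dictionary format, which the abstract does not describe. It only shows how a prefix-free code set grows by choosing one existing codeword as a prefix. The function names `t_augment` and `is_prefix_free`, the binary starting alphabet, and the copy factor `k` are assumptions made for illustration.

```python
def t_augment(codewords, prefix, k=1):
    """One T-augmentation step (illustrative sketch).

    The new set contains prefix^i + w for every codeword w other than
    the prefix and every i in 0..k, plus the single word prefix^(k+1).
    """
    if prefix not in codewords:
        raise ValueError("prefix must be one of the current codewords")
    others = [w for w in codewords if w != prefix]
    result = []
    for i in range(k + 1):
        result.extend(prefix * i + w for w in others)
    result.append(prefix * (k + 1))
    # Sort by length, then lexicographically, for readable output.
    return sorted(result, key=lambda w: (len(w), w))


def is_prefix_free(codewords):
    """Check that no codeword is a prefix of another.

    After lexicographic sorting it is enough to compare neighbours.
    """
    ws = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(ws, ws[1:]))


# Hypothetical example: start from the binary alphabet and augment twice.
level0 = ["0", "1"]
level1 = t_augment(level0, "0")   # prefix "0"
level2 = t_augment(level1, "1")   # prefix "1"

print(level1)  # ['1', '00', '01']
print(level2)  # ['00', '01', '11', '100', '101']
print(is_prefix_free(level2))  # True
```

Each step keeps the set prefix-free, so the resulting codewords can be concatenated and decoded without separators, as with Huffman codes.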
Similar resources
Development of a compression system dynamic simulation code for testing and designing of anti-surge control system
In recent years, several research activities have been conducted to develop knowledge in the analysis, design and optimization of compressor anti-surge control systems. Since anti-surge control testing on a full-scale compressor is limited by the possible consequences of failure, and an experimental facility for developing control strategies and logic can be expensive to set up, the design process often involves...
Arabic-document compression: A close look at group 3 international digital facsimile coding standards
Efficient bit-representation or compression of documents is an important issue in many applications. The amount of compression depends on the document contents such as written scripts, diagrams, tables, etc. The contents of the document determine the limit of this compression. In CCITT Recommendation T.4, 'Standardization of group 3 apparatus for document transmission', a modified Huffman ...
Extending the Radar Dynamic Range using Adaptive Pulse Compression
The matched filter in a radar receiver is adapted only to the transmitted signal, so its output is wasted when it does not match the signal received from the environment. The sidelobe amplitudes of the matched filter output in pulse compression radars depend on the transmitted coded waveforms and extend as much as the length of the code on both sides of the target loc...
Fast Intra Mode Decision for Depth Map coding in 3D-HEVC Standard
Three-dimensional high efficiency video coding (3D-HEVC) is the extended version of the latest video compression standard, namely high efficiency video coding (HEVC), and is used to compress 3D videos. 3D videos include texture video and depth maps. Since the statistical characteristics of depth maps are different from those of texture videos, new tools have been added to the HEVC standard fo...
Conventional Orthography for Dialectal Arabic
Dialectal Arabic (DA) refers to the day-to-day vernaculars spoken in the Arab world. DA lives side-by-side with the official language, Modern Standard Arabic (MSA). DA differs from MSA on all levels of linguistic representation, from phonology and morphology to lexicon and syntax. Unlike MSA, DA has no standard orthography since there are no Arabic dialect academies, nor is there a large edited...